Most of the current theory for dynamic programming algorithms focuses on finite-state, finite-action Markov decision problems, and there is little theory on the convergence of approximation algorithms with continuous states. In this paper we propose a policy iteration algorithm for infinite-horizon Markov decision problems where the state and action spaces are continuous and the expectation cannot be computed exactly. We show that an appropriately designed least squares (LS) or recursive least squares (RLS) method is provably convergent under certain problem structure assumptions on value functions. In addition, we show that the LS/RLS approximate policy iteration algorithm converges in the mean, meaning that the mean error between the appro...
Simulation-based policy iteration (SBPI) is a modification of the policy iteration algorithm for com...
We consider the discrete-time infinite-horizon optimal control problem formalized by Markov decisio...
We consider infinite horizon dynamic programming problems, where the control at each stage consists ...
In this paper, we present a recursive least squares approximate policy iteration (RLSAPI)...
We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding t...
We consider finite-state Markov decision processes, and prove convergence and rate of convergence re...
In this paper we study a class of modified policy iteration algorithms for solving Markov decision p...
We consider approximate dynamic programming for the infinite-horizon stationary γ-discounted optimal...
This article proposes a three-timescale simulation based algorithm for solution of infinite horizon ...
In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solu...
We consider the infinite-horizon γ-discounted optimal control problem formaliz...
We consider the infinite-horizon γ-discounted optimal control problem formalized by Markov Decision ...
Solving Markov Decision Processes is a recurrent task in engineering which can be performed efficien...
We consider the problem of finding an optimal policy in a Markov decision process that maximises the...